Syntax-based Semi-Supervised Named Entity Tagging
نویسندگان
چکیده
We report an empirical study on the role of syntactic features in building a semisupervised named entity (NE) tagger. Our study addresses two questions: What types of syntactic features are suitable for extracting potential NEs to train a classifier in a semi-supervised setting? How good is the resulting NE classifier on testing instances dissimilar from its training data? Our study shows that constituency and dependency parsing constraints are both suitable features to extract NEs and train the classifier. Moreover, the classifier showed significant accuracy improvement when constituency features are combined with new dependency feature. Furthermore, the degradation in accuracy on unfamiliar test cases is low, suggesting that the trained classifier generalizes well.
منابع مشابه
Scientific Information Extraction with Semi-supervised Neural Tagging
This paper addresses the problem of extracting keyphrases from scientific articles and categorizing them as corresponding to a task, process, or material. We cast the problem as sequence tagging and introduce semi-supervised methods to a neural tagging model, which builds on recent advances in named entity recognition. Since annotated training data is scarce in this domain, we introduce a graph...
متن کاملGraph-based Semi-supervised Gene Mention Tagging
The rapidly growing biomedical literature has been a challenging target for natural language processing algorithms. One of the tasks these algorithms focus on is called named entity recognition (NER), often employed to tag gene mentions. Here we describe a new approach for this task, an approach that uses graphbased semi-supervised learning to train a Conditional Random Field (CRF) model. Bench...
متن کاملChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Named entity recognition (NER) plays an important role in the NLP literature. The traditional methods tend to employ large annotated corpus to achieve a high performance. Different with many semi-supervised learning models for NER task, in this paper, we employ the graph-based semi-supervised learning (GBSSL) method to utilize the freely available unlabeled data. The experiment shows that the u...
متن کاملPractical Named Entity Tagging using Co-training
3] and [1] opened the possibility of using an unlabeled corpus through co-training, a semi-supervised learning algorithm, to classify named entities. Our approach to solve the problem of Korean named entity classification also adopted a co-training method called DL-CoTrain. However, we use only a part-of-speech tagger and a simple noun phrase chunker instead of a full parser to extract the cont...
متن کاملSemi-supervised part-of-speech tagging in speech applications
When no training or adaptation data is available, semisupervised training is a good alternative for processing new domains. We perform Bayesian training of a part-of-speech (POS) tagger from unannotated text and a dictionary of possible tags for each word. We complement that method with supervised prediction of possible tags for out-of-vocabulary words and study the impact of both semi-supervis...
متن کامل